{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZwZNOAMZcxl3"
      },
      "source": [
        "##### Copyright 2019 The TensorFlow Neural Structured Learning Authors"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "nxbcnXODdE06"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-BszoQj0dSZO"
      },
      "source": [
        "# Adversarial regularization for image classification"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wfqlePz0g6o5"
      },
      "source": [
        "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/neural_structured_learning/tutorials/adversarial_keras_cnn_mnist\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/neural-structured-learning/blob/master/g3doc/tutorials/adversarial_keras_cnn_mnist.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/neural-structured-learning/blob/master/g3doc/tutorials/adversarial_keras_cnn_mnist.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/neural-structured-learning/g3doc/tutorials/adversarial_keras_cnn_mnist.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "\u003c/table\u003e"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "oHEGl8h_m6tS"
      },
      "source": [
        "## Overview\n",
        "\n",
        "In this tutorial, we will explore the use of adversarial learning\n",
        "([Goodfellow et al., 2014](https://arxiv.org/abs/1412.6572)) for image\n",
        "classification using the Neural Structured Learning (NSL) framework.\n",
        "\n",
        "The core idea of adversarial learning is to train a model with adversarially-perturbed data (called adversarial examples) in addition to the organic training data. To the human eye, these adversarial examples look the same as the original but the perturbation will cause the model to be confused and make incorrect predictions or classifications. The adversarial examples are constructed to intentionally mislead the model into making wrong predictions or classifications. By training with such examples, the model learns to be robust against adversarial perturbation when making predictions.\n",
        "\n",
        "In this tutorial, we illustrate the following procedure of applying adversarial\n",
        "learning to obtain robust models using the Neural Structured Learning framework:\n",
        "\n",
        "1.  Create a neural network as a base model. In this tutorial, the base model is\n",
        "    created with the `tf.keras` functional API; this procedure is compatible\n",
        "    with models created by `tf.keras` sequential and subclassing APIs as well.\n",
        "    For more information on Keras models in TensorFlow, see this [documentation](https://www.tensorflow.org/api_docs/python/tf/keras/Model).\n",
        "2.  Wrap the base model with the **`AdversarialRegularization`** wrapper class,\n",
        "    which is provided by the NSL framework, to create a new `tf.keras.Model`\n",
        "    instance. This new model will include the adversarial loss as a\n",
        "    regularization term in its training objective.\n",
        "3.  Convert examples in the training data to feature dictionaries.\n",
        "4.  Train and evaluate the new model."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dZEDFUhqn42r"
      },
      "source": [
        "## Recap for Beginners\n",
        "\n",
        "\n",
        "\n",
        "There is a corresponding [video explanation](https://youtu.be/Js2WJkhdU7k) on adversarial learning for image classification part of the TensorFlow Neural Structured Learning Youtube series. Below, we have summarized the key concepts explained in this video, expanding on the explanation provided in the Overview section above.\n",
        "\n",
        "The NSL framework jointly optimizes both image features and structured signals to help neural networks better learn. However, what if there is no explicit structure available to train the neural network? This tutorial explains one approach involving the creation of adversarial neighbors (modified from the original sample) to dynamically construct a structure.\n",
        "\n",
        "Firstly, adversarial neighbors are defined as modified versions of the sample image applied with small perturbations that mislead a neural net into outputting inaccurate classifications. These carefully designed perturbations are typically based on the reverse gradient direction and are meant to confuse the neural net during training. Humans may not be able to tell the difference between a sample image and it's generated adversarial neighbor. However, to the neural net, the applied perturbations are effective at leading to an inaccurate conclusion. \n",
        "\n",
        "Generated adversarial neighbors are then connected to the sample, therefore dynamically constructing a structure edge by edge. Using this connection, neural nets learn to maintain the similarities between the sample and the adversarial neighbors while avoiding confusion resulting from misclassifications, thus improving the overall neural network's quality and accuracy. \n",
        "\n",
        "The code segment below is a high-level explanation of the steps involved while the rest of this tutorial goes into further depth and technicality.\n",
        "\n",
        "1. Read and prepare the data. Load the MNIST dataset and normalize the feature values to stay in the range [0,1]\n",
        "```\n",
        "import neural_structured_learning as nsl\n",
        "\n",
        "(x_train, y_train), (x_train, y_train) = tf.keras.datasets.mnist.load_data()\n",
        "x_train, x_test = x_train / 255.0, x_test / 255.0\n",
        "```"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WSlSGafKn42s"
      },
      "source": [
        "2. Build the neural network. A Sequential Keras base model is used for this example.\n",
        "```\n",
        "model = tf.keras.Sequential(...)\n",
        "```\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wFJ6cixdn42s"
      },
      "source": [
        "3. Configure the adversarial model. Including the hyperparameters: multiplier applied on the adversarial regularization, empirically chosen differ values for step size/learning rate. Invoke adversarial regularization with a wrapper class around the constructed neural network.\n",
        "```\n",
        "adv_config = nsl.configs.make_adv_reg_config(multiplier=0.2, adv_step_size=0.05)\n",
        "adv_model = nsl.keras.AdversarialRegularization(model, adv_config)\n",
        "```"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6ohmfLgLn42s"
      },
      "source": [
        "4. Conclude with the standard Keras workflow: compile, fit, evaluate.\n",
        "```\n",
        "adv_model.compile(optimizer='adam', loss='sparse_categorizal_crossentropy', metrics=['accuracy'])\n",
        "adv_model.fit({'feature': x_train, 'label': y_train}, epochs=5)\n",
        "adv_model.evaluate({'feature': x_test, 'label': y_test})\n",
        "```"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VgSOF-49Q7kS"
      },
      "source": [
        "What you see here is adversarial learning enabled in 2 steps and 3 simple lines of code. This is the simplicity of the neural structured learning framework. In the following sections, we expand upon this procedure."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qODwGDl-n42t"
      },
      "source": [
        "## Setup"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4RhmgQ7-mlrl"
      },
      "source": [
        "Install the Neural Structured Learning package."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ByJ7133BQULR"
      },
      "outputs": [],
      "source": [
        "!pip install --quiet neural-structured-learning"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PZvsEQrhSqKx"
      },
      "source": [
        "Import libraries. We abbreviate `neural_structured_learning` to `nsl`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "EuqEuAYzTMo0"
      },
      "outputs": [],
      "source": [
        "import matplotlib.pyplot as plt\n",
        "import neural_structured_learning as nsl\n",
        "import numpy as np\n",
        "import tensorflow as tf\n",
        "import tensorflow_datasets as tfds"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3LwBtQGaTvbe"
      },
      "source": [
        "## Hyperparameters\n",
        "\n",
        "We collect and explain the hyperparameters (in an `HParams` object) for model\n",
        "training and evaluation.\n",
        "\n",
        "Input/Output:\n",
        "\n",
        "*   **`input_shape`**: The shape of the input tensor. Each image is 28-by-28\n",
        "pixels with 1 channel.\n",
        "*   **`num_classes`**: There are a total of 10 classes, corresponding to 10\n",
        "digits [0-9].\n",
        "\n",
        "Model architecture:\n",
        "\n",
        "*   **`conv_filters`**: A list of numbers, each specifying the number of\n",
        "filters in a convolutional layer.\n",
        "*   **`kernel_size`**: The size of 2D convolution window, shared by all\n",
        "convolutional layers.\n",
        "*   **`pool_size`**: Factors to downscale the image in each max-pooling layer.\n",
        "*   **`num_fc_units`**: The number of units (i.e., width) of each\n",
        "fully-connected layer.\n",
        "\n",
        "Training and evaluation:\n",
        "\n",
        "*  **`batch_size`**: Batch size used for training and evaluation.\n",
        "*  **`epochs`**: The number of training epochs.\n",
        "\n",
        "Adversarial learning:\n",
        "\n",
        "*   **`adv_multiplier`**: The weight of adversarial loss in the training\n",
        "objective, relative to the labeled loss.\n",
        "*   **`adv_step_size`**: The magnitude of adversarial perturbation.\n",
        "*  **`adv_grad_norm`**: The norm to measure the magnitude of adversarial\n",
        "perturbation.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "iOc8YdmIRSHo"
      },
      "outputs": [],
      "source": [
        "class HParams(object):\n",
        "  def __init__(self):\n",
        "    self.input_shape = [28, 28, 1]\n",
        "    self.num_classes = 10\n",
        "    self.conv_filters = [32, 64, 64]\n",
        "    self.kernel_size = (3, 3)\n",
        "    self.pool_size = (2, 2)\n",
        "    self.num_fc_units = [64]\n",
        "    self.batch_size = 32\n",
        "    self.epochs = 5\n",
        "    self.adv_multiplier = 0.2\n",
        "    self.adv_step_size = 0.2\n",
        "    self.adv_grad_norm = 'infinity'\n",
        "\n",
        "HPARAMS = HParams()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "72zL1AMcYYGG"
      },
      "source": [
        "## MNIST dataset\n",
        "\n",
        "The [MNIST dataset](http://yann.lecun.com/exdb/mnist/) contains grayscale\n",
        "images of handwritten digits (from '0' to '9'). Each image shows one digit at\n",
        "low resolution (28-by-28 pixels). The task involved is to classify images into\n",
        "10 categories, one per digit.\n",
        "\n",
        "Here we load the MNIST dataset from\n",
        "[TensorFlow Datasets](https://www.tensorflow.org/datasets). It handles\n",
        "downloading the data and constructing a `tf.data.Dataset`. The loaded dataset\n",
        "has two subsets:\n",
        "\n",
        "*   `train` with 60,000 examples, and\n",
        "*   `test` with 10,000 examples.\n",
        "\n",
        "Examples in both subsets are stored in feature dictionaries with the following\n",
        "two keys:\n",
        "\n",
        "*   `image`: Array of pixel values, ranging from 0 to 255.\n",
        "*   `label`: Groundtruth label, ranging from 0 to 9."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "R1dK6E4axNHB"
      },
      "outputs": [],
      "source": [
        "datasets = tfds.load('mnist')\n",
        "\n",
        "train_dataset = datasets['train']\n",
        "test_dataset = datasets['test']\n",
        "\n",
        "IMAGE_INPUT_NAME = 'image'\n",
        "LABEL_INPUT_NAME = 'label'"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IBkh4mbsxLR_"
      },
      "source": [
        "To make the model numerically stable, we normalize the pixel values to [0, 1]\n",
        "by mapping the dataset over the `normalize` function. After shuffling training\n",
        "set and batching, we convert the examples to feature tuples `(image, label)`\n",
        "for training the base model. We also provide a function to convert from tuples\n",
        "to dictionaries for later use."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VhMEJqKs0_7z"
      },
      "outputs": [],
      "source": [
        "def normalize(features):\n",
        "  features[IMAGE_INPUT_NAME] = tf.cast(\n",
        "      features[IMAGE_INPUT_NAME], dtype=tf.float32) / 255.0\n",
        "  return features\n",
        "\n",
        "def convert_to_tuples(features):\n",
        "  return features[IMAGE_INPUT_NAME], features[LABEL_INPUT_NAME]\n",
        "\n",
        "def convert_to_dictionaries(image, label):\n",
        "  return {IMAGE_INPUT_NAME: image, LABEL_INPUT_NAME: label}\n",
        "\n",
        "train_dataset = train_dataset.map(normalize).shuffle(10000).batch(HPARAMS.batch_size).map(convert_to_tuples)\n",
        "test_dataset = test_dataset.map(normalize).batch(HPARAMS.batch_size).map(convert_to_tuples)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JrrMpPNmpCKK"
      },
      "source": [
        "## Base model\n",
        "\n",
        "Our base model will be a neural network consisting of 3 convolutional layers\n",
        "follwed by 2 fully-connected layers (as defined in `HPARAMS`). Here we define\n",
        "it using the Keras functional API. Feel free to try other APIs or model\n",
        "architectures (e.g. subclassing). Note that the NSL framework does support all three types of Keras APIs."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4UjrtuIsYWo3"
      },
      "outputs": [],
      "source": [
        "def build_base_model(hparams):\n",
        "  \"\"\"Builds a model according to the architecture defined in `hparams`.\"\"\"\n",
        "  inputs = tf.keras.Input(\n",
        "      shape=hparams.input_shape, dtype=tf.float32, name=IMAGE_INPUT_NAME)\n",
        "\n",
        "  x = inputs\n",
        "  for i, num_filters in enumerate(hparams.conv_filters):\n",
        "    x = tf.keras.layers.Conv2D(\n",
        "        num_filters, hparams.kernel_size, activation='relu')(\n",
        "            x)\n",
        "    if i \u003c len(hparams.conv_filters) - 1:\n",
        "      # max pooling between convolutional layers\n",
        "      x = tf.keras.layers.MaxPooling2D(hparams.pool_size)(x)\n",
        "  x = tf.keras.layers.Flatten()(x)\n",
        "  for num_units in hparams.num_fc_units:\n",
        "    x = tf.keras.layers.Dense(num_units, activation='relu')(x)\n",
        "  pred = tf.keras.layers.Dense(hparams.num_classes)(x)\n",
        "  model = tf.keras.Model(inputs=inputs, outputs=pred)\n",
        "  return model"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "288nsmN5pLoo"
      },
      "outputs": [],
      "source": [
        "base_model = build_base_model(HPARAMS)\n",
        "base_model.summary()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mlTUGn1t_HAr"
      },
      "source": [
        "Next we train and evaluate the base model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "K2cFDbmRpRMp"
      },
      "outputs": [],
      "source": [
        "base_model.compile(\n",
        "    optimizer='adam',\n",
        "    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n",
        "    metrics=['acc'])\n",
        "base_model.fit(train_dataset, epochs=HPARAMS.epochs)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "J94Y_WTaqAsi"
      },
      "outputs": [],
      "source": [
        "results = base_model.evaluate(test_dataset)\n",
        "named_results = dict(zip(base_model.metrics_names, results))\n",
        "print('\\naccuracy:', named_results['acc'])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "c8OClWqGALIm"
      },
      "source": [
        "We can see that the base model achieves 99% accuracy on the test set. We will\n",
        "see how robust it is in\n",
        "[Robustness Under Adversarial Perturbations](#scrollTo=HXK9MGG8lBX3) below."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CemXA8N9q336"
      },
      "source": [
        "## Adversarial-regularized model\n",
        "\n",
        "Here we show how to incorporate adversarial training into a Keras model with a\n",
        "few lines of code, using the NSL framework. The base model is wrapped to create\n",
        "a new `tf.Keras.Model`, whose training objective includes adversarial\n",
        "regularization."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YUOpl-rkzRrY"
      },
      "source": [
        "First, we create a config object with all relevant hyperparameters using the\n",
        "helper function `nsl.configs.make_adv_reg_config`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "-WWVwJB2qstE"
      },
      "outputs": [],
      "source": [
        "adv_config = nsl.configs.make_adv_reg_config(\n",
        "    multiplier=HPARAMS.adv_multiplier,\n",
        "    adv_step_size=HPARAMS.adv_step_size,\n",
        "    adv_grad_norm=HPARAMS.adv_grad_norm\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "OmeIUyxE4s68"
      },
      "source": [
        "Now we can wrap a base model with `AdversarialRegularization`. Here we create a\n",
        "new base model (`base_adv_model`), so that the existing one (`base_model`) can\n",
        "be used in later comparison.\n",
        "\n",
        "The returned `adv_model` is a `tf.keras.Model` object, whose training objective\n",
        "includes a regularization term for the adversarial loss. To compute that loss,\n",
        "the model has to have access to the label information (feature `label`), in\n",
        "addition to regular input (feature `image`). For this reason, we convert the\n",
        "examples in the datasets from tuples back to dictionaries. And we tell the\n",
        "model which feature contains the label information via the `label_keys`\n",
        "parameter."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "TObqJLEX4sQq"
      },
      "outputs": [],
      "source": [
        "base_adv_model = build_base_model(HPARAMS)\n",
        "adv_model = nsl.keras.AdversarialRegularization(\n",
        "    base_adv_model,\n",
        "    label_keys=[LABEL_INPUT_NAME],\n",
        "    adv_config=adv_config\n",
        ")\n",
        "\n",
        "train_set_for_adv_model = train_dataset.map(convert_to_dictionaries)\n",
        "test_set_for_adv_model = test_dataset.map(convert_to_dictionaries)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "aKTQWzfj7JvL"
      },
      "source": [
        "Next we compile, train, and evaluate the\n",
        "adversarial-regularized model. There might be warnings like\n",
        "\"Output missing from loss dictionary,\" which is fine because\n",
        "the `adv_model` doesn't rely on the base implementation to\n",
        "calculate the total loss."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "aTSK-cHbuWDw"
      },
      "outputs": [],
      "source": [
        "adv_model.compile(\n",
        "    optimizer='adam',\n",
        "    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n",
        "    metrics=['acc'])\n",
        "adv_model.fit(train_set_for_adv_model, epochs=HPARAMS.epochs)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3v_Jn7wuviZx"
      },
      "outputs": [],
      "source": [
        "results = adv_model.evaluate(test_set_for_adv_model)\n",
        "named_results = dict(zip(adv_model.metrics_names, results))\n",
        "print('\\naccuracy:', named_results['sparse_categorical_accuracy'])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LgnslZYk9Acg"
      },
      "source": [
        "We can see that the adversarial-regularized model also performs very well (99%\n",
        "accuracy) on the test set."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HXK9MGG8lBX3"
      },
      "source": [
        "## Robustness under Adversarial perturbations\n",
        "\n",
        "Now we compare the base model and the adversarial-regularized model for\n",
        "robustness under adversarial perturbation.\n",
        "\n",
        "We will use the `AdversarialRegularization.perturb_on_batch` function for\n",
        "generating adversarially perturbed examples. And we would like the generation\n",
        "based on the base model. To do so, we wrap the base model with\n",
        "`AdversarialRegularization`. Note that as long as we don't invoke training (`Model.fit`), the learned variables in the model won't change and the model is\n",
        "still the same one as in section [Base Model](#scrollTo=JrrMpPNmpCKK)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "FLkYw54pvxJO"
      },
      "outputs": [],
      "source": [
        "reference_model = nsl.keras.AdversarialRegularization(\n",
        "    base_model, label_keys=[LABEL_INPUT_NAME], adv_config=adv_config)\n",
        "reference_model.compile(\n",
        "    optimizer='adam',\n",
        "    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n",
        "    metrics=['acc'])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DR0Rn5rxBeDh"
      },
      "source": [
        "We collect in a dictionary the models to be evaluted, and also create a metric\n",
        "object for each of the models.\n",
        "\n",
        "Note that we take `adv_model.base_model` in order to have the same input format\n",
        "(not requiring label information) as the base model. The learned variables in\n",
        "`adv_model.base_model` are the same as those in `adv_model`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "igRBxPlPm_JE"
      },
      "outputs": [],
      "source": [
        "models_to_eval = {\n",
        "    'base': base_model,\n",
        "    'adv-regularized': adv_model.base_model\n",
        "}\n",
        "metrics = {\n",
        "    name: tf.keras.metrics.SparseCategoricalAccuracy()\n",
        "    for name in models_to_eval.keys()\n",
        "}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BAPYegAbC8mZ"
      },
      "source": [
        "Here is the loop to generate perturbed examples and to evaluate models with\n",
        "them. We save the perturbed images, labels, and predictions for visualization\n",
        "in the next section."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "IGnLXhswmUN8"
      },
      "outputs": [],
      "source": [
        "perturbed_images, labels, predictions = [], [], []\n",
        "\n",
        "for batch in test_set_for_adv_model:\n",
        "  perturbed_batch = reference_model.perturb_on_batch(batch)\n",
        "  # Clipping makes perturbed examples have the same range as regular ones.\n",
        "  perturbed_batch[IMAGE_INPUT_NAME] = tf.clip_by_value(\n",
        "      perturbed_batch[IMAGE_INPUT_NAME], 0.0, 1.0)\n",
        "  y_true = perturbed_batch.pop(LABEL_INPUT_NAME)\n",
        "  perturbed_images.append(perturbed_batch[IMAGE_INPUT_NAME].numpy())\n",
        "  labels.append(y_true.numpy())\n",
        "  predictions.append({})\n",
        "  for name, model in models_to_eval.items():\n",
        "    y_pred = model(perturbed_batch)\n",
        "    metrics[name](y_true, y_pred)\n",
        "    predictions[-1][name] = tf.argmax(y_pred, axis=-1).numpy()\n",
        "\n",
        "for name, metric in metrics.items():\n",
        "  print('%s model accuracy: %f' % (name, metric.result().numpy()))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "S5cC3XbRGFJQ"
      },
      "source": [
        "We can see that the accuracy of the base model drops dramatically (from 99% to\n",
        "about 50%) when the input is perturbed adversarially. On the other hand, the\n",
        "accuracy of the adversarial-regularized model only degrades a little (from 99%\n",
        "to 95%). This demonstrates the effectiveness of adversarial learning on\n",
        "improving model's robustness."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YfB5oBBfWLRK"
      },
      "source": [
        "## Examples of adversarially-perturbed images\n",
        "\n",
        "Here we take a look at the adversarially-perturbed images. We can see that the\n",
        "perturbed images still show digits recognizable by human, but can successfully\n",
        "fool the base model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3iK9vO_xKJfg"
      },
      "outputs": [],
      "source": [
        "batch_index = 0\n",
        "\n",
        "batch_image = perturbed_images[batch_index]\n",
        "batch_label = labels[batch_index]\n",
        "batch_pred = predictions[batch_index]\n",
        "\n",
        "batch_size = HPARAMS.batch_size\n",
        "n_col = 4\n",
        "n_row = (batch_size + n_col - 1) // n_col\n",
        "\n",
        "print('accuracy in batch %d:' % batch_index)\n",
        "for name, pred in batch_pred.items():\n",
        "  print('%s model: %d / %d' % (name, np.sum(batch_label == pred), batch_size))\n",
        "\n",
        "plt.figure(figsize=(15, 15))\n",
        "for i, (image, y) in enumerate(zip(batch_image, batch_label)):\n",
        "  y_base = batch_pred['base'][i]\n",
        "  y_adv = batch_pred['adv-regularized'][i]\n",
        "  plt.subplot(n_row, n_col, i+1)\n",
        "  plt.title('true: %d, base: %d, adv: %d' % (y, y_base, y_adv))\n",
        "  plt.imshow(tf.keras.utils.array_to_img(image), cmap='gray')\n",
        "  plt.axis('off')\n",
        "\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "g_vo1pWYJlHP"
      },
      "source": [
        "## Conclusion\n",
        "\n",
        "We have demonstrated the use of adversarial learning for image classification\n",
        "using the Neural Structured Learning (NSL) framework. We encourage users to\n",
        "experiment with different adversarial settings (in hyper-parameters) and to see\n",
        "how they affect model robustness."
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "collapsed_sections": [],
      "name": "Adversarial regularization for image classification",
      "private_outputs": true,
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
